Phylogenetic networks provide a way to describe and visualize evolutionary histories that have
undergone so-called reticulate evolutionary events such as recombination, hybridization or horizontal
gene transfer. The level k of a network determines how non-treelike the evolution can be, with level-0
networks being trees. We study the problem of constructing level-k phylogenetic networks from triplets,
i.e. phylogenetic trees for three leaves (taxa). We give, for each k, a level-k network that is uniquely
defined by its triplets. We demonstrate the applicability of this result by using it to prove that (1) for
all k ≥ 1 it is NP-hard to construct a level-k network consistent with all input triplets, and (2) for all
k ≥ 0 it is NP-hard to construct a level-k network consistent with a maximum number of input triplets,
even when the input is dense. As a response to this intractability we give an exact algorithm for
constructing level-1 networks consistent with a maximum number of input triplets.

Keywords: Phylogenetic networks; NP-hardness; exact algorithms. 

1. Introduction 

A central problem in biology is to accurately reconstruct plausible evolutionary histories. This area 
of research is called phylogenetics and provides fascinating challenges for both biologists and mathe- 
maticians. Throughout most of the history of phylogenetics researchers have concentrated on construct- 
ing phylogenetic trees. In recent years however, more and more attention is devoted to phylogenetic 
networks. From a biological point of view, networks are able to explain and visualize more complex 
evolutionary scenarios, since they take into account biological phenomena that cannot be displayed in 
a tree. These phenomena are so-called reticulate evolutionary events such as hybridization, recombina- 
tion and horizontal gene transfer. From a mathematical point of view however, phylogenetic networks 
pose formidable challenges. Irrespective of the exact model being used, many problems that are com- 
putationally tractable for trees (i.e. solvable in polynomial time) become intractable (NP-hard) for 
networks. Huson and Bryant wrote a detailed discussion of phylogenetic networks and their application.
Here we study the level of networks, which restricts how interwoven the reticulations can be. In
trees (i.e. level-0 networks) no reticulation events occur; in level-1 networks all reticulation cycles must
be disjoint. The higher the level of the network, the more freedom in reticulation is allowed. Formally,
a level-k network is a phylogenetic network in which each biconnected component contains at most k
reticulation events. Level-1 networks have also been called galled trees [11], gt-networks [19] and galled
networks [15]. General level-k networks were first introduced by Choy, Jansson, Sadakane and Sung [8].
The focus on level (as opposed to, for example, minimizing the total number of reticulation vertices) 
is motivated by several factors. Firstly, level induces a hierarchy on the space of networks with lower 
level networks being more 'tree-like' than higher-level networks. Identifying the position of candidate 
solutions (i.e. networks) within this hierarchy, or finding the minimum level at which candidate solutions 
exist, communicates important structural information about the solution space. (Level minimization, 
which derives its legitimacy from the parsimony principle, can also be used in an implicit context e.g. to 
measure the accuracy of input data. For example, if we expect the solution to be a tree, but only obtain 
higher level networks, this suggests that data errors lie in the regions corresponding to the biconnected 
components.) Secondly, from an algorithmic/mathematical perspective, focussing on lower-level
networks can yield corresponding improvements in tractability/running time and a clearer mathematical
analysis. Finally, restricting level is for many optimization criteria necessary to avoid trivial solutions,
e.g. several of the problems we discuss in this article can be trivially optimized if we choose a solution 
with high enough level, but (as we shall see) this communicates no useful information. 

A great variety of approaches have been proposed for phylogenetic reconstruction. They include 
methods like Maximum Parsimony, Maximum Likelihood, quartet-based methods, Bayesian methods 
(using Monte Carlo Markov Chain), distance-based methods (using e.g. Neighbor Joining or UPGMA) 
and many others. They all have their advantages and drawbacks [5,12,17,24].

In this article we consider a triplet-based approach to construct directed phylogenetic networks. As 
input we take a collection of triplets, which are rooted phylogenetic trees on size-3 subsets of the taxa. 
These triplets can, for example, be constructed by methods such as Maximum Parsimony or Maximum 
Likelihood, that work accurately and fast for small numbers of taxa. Another possibility is to infer 
the triplets from a set of phylogenetic trees, possibly originating from different sources. However the 
triplets are obtained, the next step is to combine them into a single, large phylogenetic network for all 
taxa. Designing algorithms for the latter task forms the subject of this article. Triplet methods have 
become popular since they allow us to solve certain problems in polynomial time, as will be elaborated 
on shortly. Next to that, an advantage of these methods is that they provide the possibility to combine 
different sorts of biological data. 

Triplet-based methods have been extensively studied in the literature. Aho et al. [1] gave a
polynomial-time algorithm that constructs a tree from triplets if there exists a tree that is consistent
with all input triplets. This positive result provided the stimulus for studying the applicability of
triplet-based methods to networks. Unfortunately, it has been shown that for level-1 [15] and level-2
[28,27] networks the corresponding problem becomes NP-hard. However, the same articles give
polynomial-time algorithms for the problem where the input is dense, i.e. there is at least one triplet
in the input for every size-3 subset of the taxa. A related problem that accommodates errors in the
triplets is finding a tree consistent with as many input triplets as possible. This problem is NP-hard
[4,14,29], and approximation algorithms have been explored both for the construction of trees [10] and
level-1 networks [6,15]. For the construction of trees, efficient heuristics have been designed by Semple
and Steel [23], Page [20], Wu [29] and Snir and Rao [25]. The last algorithm (MAX CUT triplets)
outperforms the character-based method Matrix Representation with Parsimony (MRP), which is popular
in practice [3,21,22].
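
To make the tree case concrete, the following is a minimal Python sketch of the classical recursive procedure behind the result of Aho et al. (commonly known as BUILD). The function name, the encoding of a triplet xy|z as the tuple (x, y, z) and the nested-tuple output are illustrative assumptions, not notation from this article; the tree returned need not be binary.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree consistent with all triplets, or None.

    A triplet xy|z is encoded as the tuple (x, y, z) (an assumed encoding).
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only the triplets that lie entirely inside the current leaf set.
    local = [t for t in triplets if set(t) <= leaves]

    # Auxiliary graph on the leaves: join x and y for every triplet xy|z.
    # Its connected components are tracked with a small union-find.
    parent = {v: v for v in leaves}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x, y, _ in local:
        parent[find(x)] = find(y)

    components = {}
    for v in leaves:
        components.setdefault(find(v), set()).add(v)
    if len(components) == 1:
        return None  # a connected auxiliary graph means no consistent tree exists

    subtrees = []
    for comp in components.values():
        sub = build(comp, local)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)
```

Each connected component of the auxiliary graph becomes one child of the current root, which is why a single component on more than one leaf signals that no consistent tree exists.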

In this paper we study the structure and construction of level-k networks. First, we analyze the
minimum level k ensuring that for each input triplet set on n leaves there exists a level-k network
consistent with all triplets (in Sect. 3). Then we use this analysis to give, for each k, a level-k network
that is uniquely defined by the set of triplets it is consistent with, i.e. no other level-k network is
consistent with that set of triplets (in Sect. 4). These networks we use to give two NP-hardness results
in Sect. 5. We prove that constructing level-k networks consistent with all input triplets is NP-hard for
every k ≥ 1. This complements the known results for k ∈ {1, 2} (see above), of which our result for k > 2
is a non-trivial generalization. In addition, we show that constructing a level-k network consistent with
a maximum number of input triplets is NP-hard for all k ≥ 0, even if the input triplet set is dense. This
means that it is even NP-hard to construct a phylogenetic tree consistent with a maximum number of
triplets from a dense triplet set. We respond to the aforementioned intractability results with an exact
algorithm for constructing level-1 phylogenetic networks in Sect. 6. This algorithm runs in time O(m4^n)
(for n leaves and m triplets) and can also be used for the weighted version of the problem. Authors
working on the unrooted analogue of triplets, quartets, have noted that their methods are particularly
powerful when the input quartets are chosen carefully (and are, for example, not forced to contain
information for each quadruple of leaves) [26]. The level-1 algorithm we present can tolerate such inputs
(i.e. non-dense input sets) and for this reason we are optimistic about the biological relevance of the
solutions it produces. We conclude with a discussion of open problems.

Figure 1. (a) One of three possible triplets on leaves x, y, z and (b) an example of a level-1 network. As
with all figures in this article, all arcs are directed downwards.

2. Preliminaries 

A phylogenetic network (network for short) is defined as a directed acyclic graph in which a single vertex
has indegree 0 and outdegree 2 (the root) and all other vertices have either indegree 1 and outdegree 2
(split vertices), indegree 2 and outdegree 1 (reticulation vertices) or indegree 1 and outdegree 0 (leaves),
where the leaves are distinctly labeled. Let L(N) denote the set of leaves of a network N.

A directed acyclic graph is connected (also "weakly connected") if there is an undirected path
between any two vertices, and biconnected if it contains no vertex whose removal disconnects the graph. 
A biconnected component of a network is a maximal biconnected subgraph and is called trivial if it is 
equal to two vertices connected by an arc. Otherwise, it is non-trivial. An arc a = (u, v) of a network 
N is a cut-arc if its removal disconnects N; it is trivial if v is a leaf and otherwise non-trivial. A vertex 
w is below an arc a = (u, v) (and below vertex v) if there is a directed path from v to w. 

Definition 1. A network is said to be a level-k network if each biconnected component contains at 
most k reticulation vertices. 

To avoid "redundant" networks, we require every non-trivial biconnected component of a network to
have at least three outgoing arcs. A level-k network is a strict level-k network if it is not a level-(k − 1)
network. The level-0 networks are exactly the phylogenetic trees (trees for short); they have no
reticulation vertices.
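
Definition 1 can be checked mechanically on small examples. The sketch below is only an illustration under assumptions of our own: a network is encoded as a list of arcs (u, v), the biconnected components of the underlying undirected graph are found with the textbook depth-first search that keeps a stack of edges, and a reticulation vertex is counted in the component that contains both of its incoming arcs. The function names are hypothetical.

```python
from collections import defaultdict


def biconnected_components(arcs):
    """Arc lists of the biconnected components of the underlying undirected graph."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(arcs):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    depth, low, stack, comps = {}, {}, [], []

    def dfs(v, parent_edge):
        low[v] = depth[v]
        for w, idx in adj[v]:
            if idx == parent_edge:
                continue
            if w not in depth:              # tree edge
                stack.append(idx)
                depth[w] = depth[v] + 1
                dfs(w, idx)
                low[v] = min(low[v], low[w])
                if low[w] >= depth[v]:      # v separates the subtree of w
                    comp = []
                    while True:
                        e = stack.pop()
                        comp.append(arcs[e])
                        if e == idx:
                            break
                    comps.append(comp)
            elif depth[w] < depth[v]:       # back edge to an ancestor
                stack.append(idx)
                low[v] = min(low[v], depth[w])

    for v in list(adj):
        if v not in depth:
            depth[v] = 0
            dfs(v, None)
    return comps


def level(arcs):
    """Largest number of reticulation vertices inside one biconnected component."""
    best = 0
    for comp in biconnected_components(arcs):
        indegree = defaultdict(int)
        for _, v in comp:
            indegree[v] += 1
        best = max(best, sum(1 for d in indegree.values() if d == 2))
    return best
```

The recursion is adequate for the small networks used in examples; a large input would call for an iterative traversal.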

A triplet xy|z is a tree on leaves x, y, z such that the lowest common ancestor of x and y is a proper
descendant of the lowest common ancestor of x and z, see Fig. 1(a). The leaves of triplet t form the
set L(t). A set T of triplets has leaf set $L(T) = \bigcup_{t \in T} L(t)$, with size n = |L(T)|. For L' ⊆ L(T)
denote by T|L' the set of triplets t ∈ T with L(t) ⊆ L'. A set T of triplets is dense if it contains at least
one triplet for each size-3 subset of L(T).
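
As a small illustration of these definitions, the following Python sketch tests density and computes the restriction T|L'. The tuple encoding (x, y, z) for a triplet xy|z and the function names are assumptions made for this example only.

```python
from itertools import combinations


def leaf_set(triplets):
    """L(T): the union of the leaf sets of all triplets."""
    return set().union(*triplets)


def is_dense(triplets):
    """True if every size-3 subset of L(T) carries at least one triplet."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(c) in covered
               for c in combinations(sorted(leaf_set(triplets)), 3))


def restrict(triplets, subset):
    """T|L': the triplets whose three leaves all lie in `subset`."""
    subset = set(subset)
    return [t for t in triplets if set(t) <= subset]
```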

Definition 2. A triplet xy|z is consistent with a network N (interchangeably: N is consistent with
xy|z) if N contains a subdivision of xy|z, i.e. if N contains vertices u ≠ v and pairwise internally
vertex-disjoint paths u → x, u → y, v → u and v → z.

Figure 2. The unique simple level-1 generator and the four simple level-2 generators.

By extension, a set T of triplets is consistent with N (interchangeably: N is consistent with T) if
every triplet in T is consistent with N and L(T) = L(N). For example, Fig. 1(b) is a level-1 network
with two non-trivial biconnected components, each of them containing one reticulation vertex. This
network is consistent with (amongst others) the triplets bc|a, bd|h, hd|b and gi|k, but is not consistent
with dg|k, ab|c, gk|i or cd|f.
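
Definition 2 can be tested directly on very small networks. The sketch below is a brute-force illustration, not an efficient method: it enumerates all directed paths and looks for vertices u ≠ v with the four pairwise internally vertex-disjoint paths required by the definition. The arc-list encoding of a network, the tuple (x, y, z) for xy|z and all function names are assumptions of this example.

```python
def all_paths(succ, src, dst):
    """All directed paths from src to dst in a DAG, as lists of vertices."""
    if src == dst:
        return [[src]]
    return [[src] + p for w in succ.get(src, ()) for p in all_paths(succ, w, dst)]


def internally_disjoint(paths):
    """True if no internal vertex of one path lies on any other path."""
    for i, p in enumerate(paths):
        inner = set(p[1:-1])
        for j, q in enumerate(paths):
            if i != j and inner & set(q):
                return False
    return True


def is_consistent(arcs, triplet):
    """Brute-force test whether the triplet xy|z is consistent with the network."""
    x, y, z = triplet
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, []).append(b)
    vertices = {a for arc in arcs for a in arc}
    for v in vertices:
        for u in vertices - {v}:
            for p_vu in all_paths(succ, v, u):
                for p_vz in all_paths(succ, v, z):
                    for p_ux in all_paths(succ, u, x):
                        for p_uy in all_paths(succ, u, y):
                            if internally_disjoint([p_vu, p_vz, p_ux, p_uy]):
                                return True
    return False
```

The running time is exponential in general, so this is only meant for checking hand-drawn examples.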

We introduce the class of simple level-k networks. Intuitively, these are the basic building blocks
of level-k networks in the sense that each non-trivial biconnected component of a level-k network is in
essence a simple level-ℓ network, for some ℓ ≤ k. These simple networks will be built by adding leaves
to "generators", which are formally defined as follows.

Definition 3. A simple level-k generator, for k ≥ 1, is a directed acyclic biconnected multigraph,
which has a single root (a vertex with indegree 0 and outdegree 2), precisely k reticulation vertices
(with indegree 2 and outdegree at most 1) and apart from that only split vertices (with indegree 1 and
outdegree 2).

A case analysis shows that there is only one simple level-1 generator and that there are four simple
level-2 generators [27], depicted in Fig. 2. Computer calculations revealed the 65 simple level-3 generators.

Definition 4. A simple level-k network, for k ≥ 1, is a network obtained by applying the following
transformation to some simple level-k generator G:

(1) first, for each pair u, v of vertices in G connected by a single arc (u, v), replace (u, v) by a path with
ℓ ≥ 0 internal vertices and for each such internal vertex w add a new leaf x and an arc (w, x);

(2) second, for each pair u, v of vertices in G connected by multiple arcs replace one such arc by a path
with at least one internal vertex and for each such internal vertex w add a new leaf x and an arc
(w, x); and treat the other arc between u, v as in step (1);

(3) third, for each vertex v of G with indegree 2 and outdegree 0 add a new leaf y and an arc (v, y).

We remark that at least three leaves have to be added to G, to avoid redundancy of the constructed
network. A network is simple if it is a simple level-k network for some k. There is an elegant
characterisation of simple level-k networks:

Lemma 1 (Van Iersel et al. [27]). For k ≥ 1, a network N is a simple level-k network if and only if
N is a strict level-k network and every cut-arc is trivial.

In our proofs we will frequently remove leaves from a network. This might result in a graph that
is not a valid network. Therefore, we define tidying up a directed acyclic graph as repeatedly applying
the following four steps: (1) delete unlabeled vertices with outdegree 0; (2) suppress vertices with
indegree and outdegree 1; (3) replace multiple arcs by single arcs and (4) replace nontrivial biconnected
components with at most two outgoing arcs by a single vertex. Observe that if N' is the result of
removing leaves L' from network N and tidying up the resulting graph, then N' is a valid network. In
addition, observe that in this case N' is consistent with exactly the same triplets as N is, except for
triplets containing leaves from L'.

Figure 3. The level-(n − 1) network $N_F(n)$ is consistent with every triplet set on n leaves, and has the
minimum number of arcs and vertices among all such networks.
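
The first three tidying-up steps are easy to sketch in Python. The version below is a partial illustration only: it assumes a network given as a collection of arcs together with the set of remaining labeled leaves, and it deliberately omits step (4), which would additionally require the biconnected components. The function name is hypothetical.

```python
def tidy_up(arcs, leaves):
    """Apply tidying-up steps (1)-(3) until nothing changes; step (4) is omitted."""
    arcs = set(arcs)  # storing arcs in a set merges multiple arcs: step (3)
    changed = True
    while changed:
        changed = False
        vertices = {a for arc in arcs for a in arc}
        for v in vertices:
            ins = [a for a in arcs if a[1] == v]
            outs = [a for a in arcs if a[0] == v]
            if not outs and v not in leaves:
                # Step (1): delete an unlabeled vertex with outdegree 0.
                arcs -= set(ins)
                changed = True
                break
            if len(ins) == 1 and len(outs) == 1:
                # Step (2): suppress a vertex with indegree and outdegree 1.
                arcs -= {ins[0], outs[0]}
                arcs.add((ins[0][0], outs[0][1]))
                changed = True
                break
    return arcs
```

For instance, deleting the reticulation leaf of a three-leaf level-1 network and tidying up leaves a tree on the two remaining leaves, which matches the way leaf removal is used in the proofs below.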

3. Sufficiency and Necessity of Network Level 

In this section we prove that any triplet set on n leaves is consistent with a level-(n − 1) network. Then
we show that this bound is tight by giving a triplet set on n leaves that is not consistent with any
network of level smaller than n − 1.

Let $T_F(n)$ be the set of all $3\binom{n}{3}$ triplets possible on n leaves. Call $T_F(n)$ the full triplet set
on n leaves.
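
The count of triplets in the full triplet set can be confirmed by enumeration. In this short Python sketch the tuple (x, y, z) stands for the triplet xy|z; the encoding and the function name are assumptions of the example.

```python
from itertools import combinations
from math import comb


def full_triplet_set(leaves):
    """All triplets on the given leaves: three per size-3 subset."""
    result = []
    for a, b, c in combinations(sorted(leaves), 3):
        result += [(a, b, c), (a, c, b), (b, c, a)]
    return result


def full_triplet_count(n):
    """The closed form 3 * C(n, 3) for the size of the full triplet set."""
    return 3 * comb(n, 3)
```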

Proposition 1. For any triplet set T on n leaves there exists a level-(n − 1) network consistent with
T.

Proof. Let n ≥ 3 and let $N_F(n)$ be the network in Fig. 3. First, look at triplets $x_h x_i|x_j \in T_F(n)$ with
h, i ≠ n. There exists a unique split vertex v below the left child of the root, from which there are two
paths to $x_h$ and $x_i$ that have only v in common. On the other hand, there is a path from the root to
$x_j$, via the right child of the root. So the network is consistent with $x_h x_i|x_j$.

Second, look at triplets $x_i x_n|x_j \in T_F(n)$. There exists a unique split vertex v below the right child
of the root, from which there are two paths to $x_i$ and $x_n$ that have only v in common. As there is also
a path from the root to $x_j$ via the left child of the root, the network is consistent with $x_i x_n|x_j$. Given
that $T \subseteq T_F(n)$, the result follows. □

Lemma 2. Any network consistent with the full triplet set must be simple. 

Proof. Let n ≥ 3 and let N be consistent with $T_F(n)$. If N is not simple then, by Lemma 1, it contains
a non-trivial cut-arc a = (u, v). If there is only one leaf below a, then N is not a valid network because
it contains a biconnected component with only one outgoing arc, which we do not allow. If all leaves
are below a, then again N is not a valid network because it contains a biconnected component with
only one outgoing arc. Hence there are leaves x and y below a and a leaf z not below a. This implies
that the triplet xz|y is not consistent with N, a contradiction. □



Let a reticulation leaf be defined as a leaf whose parent is a reticulation vertex. Simple level-k networks
have, by definition, at least one and at most k reticulation leaves (k ≥ 1).

Proposition 2. The full triplet set on n ≥ 3 leaves is not consistent with any level-k network with
k < n − 1.

Proof. The proof is by induction on n. The proposition holds for n = 3: any network consistent with
$T_F(3)$ must be simple by Lemma 2, and any simple level-1 network on three leaves is consistent with
only two triplets. Since there are three possible triplets on a set of three leaves, there is no level-1
network consistent with $T_F(3)$. Let n > 3; the induction hypothesis is that for all n' < n the full triplet
set $T_F(n')$ is not consistent with any level-k' network for k' < n' − 1. Suppose for contradiction that
the proposition does not hold for n, i.e. there exists a level-k network N consistent with $T_F(n)$ and
k < n − 1. By Lemma 2, N must be a simple level-k network and thus contains a reticulation leaf x.
Delete x and tidy up the resulting graph. This decreases the level of the network since the parent of x
is a reticulation vertex and gets removed when tidying up the graph. This thus yields a level-(n − 3)
network consistent with $T_F(n − 1)$, contradicting the induction hypothesis. We thus conclude that there
exists no level-k network consistent with $T_F(n)$ for k < n − 1. □

The network $N_F(n)$ of Fig. 3 is much smaller than the network proposed by Jansson and Sung [16],
that is consistent with $T_F(n)$ and was obtained from a complicated sorting network.

Lemma 3. For n ≥ 3, the network $N_F(n)$ has the minimum number of arcs and vertices over all
networks consistent with the full triplet set $T_F(n)$.

Proof. We first show that any simple level-k network N = (V, A) on n leaves has 2n + 2k − 1 vertices
and 2n + 3k − 2 arcs. Let s be the number of split vertices. The sum of the indegrees of all vertices
is s + 2k + n, while the sum of their outdegrees is 2 + 2s + k. It is well known that in any directed
graph the sum of all outdegrees equals the sum of all indegrees. It follows that s = n + k − 2. Using this
formula we obtain that the total number of vertices equals:

$$|V| = s + k + n + 1 = (n + k - 2) + k + n + 1 = 2n + 2k - 1 .$$

Split vertices and reticulation vertices have total degree 3, leaves have total degree 1, and the root of
N has total degree 2. Thus the total number of arcs in N is:

$$|A| = \frac{3s + 3k + n + 2}{2} = \frac{3(n + k - 2) + 3k + n + 2}{2} = 2n + 3k - 2 .$$

Let $N_n$ be a network consistent with $T_F(n)$. By Lemma 2 and Proposition 2, $N_n$ is simple and has
level at least n − 1. Then the calculation above yields that $N_n$ has at least 2n + 2(n − 1) − 1 = 4n − 3
vertices and 2n + 3(n − 1) − 2 = 5(n − 1) arcs. The proof is complete by noting that $N_F(n)$ has exactly
4n − 3 vertices and 5(n − 1) arcs. □
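
The counting argument in this proof can be replayed numerically. The Python sketch below recomputes the vertex and arc counts of a simple level-k network on n leaves from the degree-sum identity; the function name is hypothetical and the code only restates the arithmetic above.

```python
def simple_network_size(n, k):
    """Vertex and arc counts of a simple level-k network on n leaves."""
    s = n + k - 2                          # split vertices, from equal in/out degree sums
    vertices = s + k + n + 1               # splits + reticulations + leaves + root
    arcs = (3 * s + 3 * k + n + 2) // 2    # half of the sum of total degrees
    return vertices, arcs
```

Substituting k = n − 1 reproduces the bounds 4n − 3 and 5(n − 1) used at the end of the proof.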



4. A Unique Level-k Network

In the construction and analysis of triplet methods, it is often important to know that a certain network
is uniquely defined by a set of triplets. Characterizing such networks is an important open problem. In
this section we present a partial solution to this question by giving, for each k, a level-k network $N^k$
that is unique in the sense that it is the only level-k network that is consistent with all triplets that
are consistent with $N^k$. In the next section we will demonstrate how useful this unique network is, by
using it to show the intractability of constructing level-k networks from triplets.

Let $N^k$ be the network to the left in Fig. 4 and let $T^k$ be the set of triplets that are consistent with
$N^k$. By hanging leaves $x_1, \ldots, x_p$ on an arc (u, v) (for some p ≥ 1) we mean replacing (u, v) by a path
$u, w_1, \ldots, w_p, v$ and adding arcs $(w_i, x_i)$ for all i = 1, ..., p.
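
The operation of hanging leaves on an arc translates directly into code. The following Python sketch assumes a network stored as a list of arcs; the names given to the new internal vertices (a prefix followed by an index) are hypothetical and are assumed not to clash with existing vertex names.

```python
def hang_leaves(arcs, arc, leaves, prefix="w"):
    """Replace arc (u, v) by the path u, w_1, ..., w_p, v and attach leaf x_i to w_i."""
    u, v = arc
    new_arcs = [a for a in arcs if a != arc]
    internal = [f"{prefix}{i}" for i in range(1, len(leaves) + 1)]
    path = [u] + internal + [v]
    new_arcs += list(zip(path, path[1:]))   # the subdivided arc
    new_arcs += list(zip(internal, leaves)) # one pendant leaf per new vertex
    return new_arcs
```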

Theorem 1. For each k ≥ 2, the network $N^k$ is the unique level-k network consistent with $T^k$.

Figure 4. Reconstruction of $N^k$ from N'. After adding $s_{5i-4}$ and $s_{5i-3}$ to N' we obtain network N''. After
also adding $s_{5i}$ we obtain N'''. Finally, after adding $s_{5i-2}$ and $s_{5i-1}$ we obtain the original network $N^k$
to the left again.



Proof. Let R be the set of reticulation leaves of $N^k$, that is $R = \{s_5, s_{10}, \ldots, s_{5k-5}, s_{5k-2}\}$. We start
by proving the following claims.

Claim 1. Any level-k network consistent with $T^k$ is a simple level-k network.

Proof of Claim 1. First observe that all triplets over the leaves $R \cup \{s_{5k-4}\}$ are in $T^k$. Let N be a
level-k network consistent with $T^k$. From Proposition 2 it follows that N is a strict level-k network. Now
suppose for contradiction that N is not simple. Then by Lemma 1, N contains a non-trivial cut-arc a.
Let B ⊂ L(N) be the set of leaves below a and let A = L(N) \ B. Because a is non-trivial, B contains
at least two leaves. For every two leaves x, y in B and every leaf z in A, there is only one triplet in $T^k$
on leaves x, y, z that is consistent with N. However, for $s_{5k-2}$ there are no two leaves x', y' such that
there is only one triplet in $T^k$ with leaves $s_{5k-2}$, x', y'. It follows that $s_{5k-2}$ belongs to neither A nor
B, a contradiction. □

Claim 2. In any level-k network consistent with $T^k$, at least one of the leaves in R is a reticulation
leaf.

Proof of Claim 2. Let N be a network consistent with $T^k$. Recall that $T^k$ contains all possible triplets
over leaves $R \cup \{s_{5k-4}\}$. Proposition 2 says that any network consistent with all triplets over
$R \cup \{s_{5k-4}\}$ cannot have level smaller than k, so N is a strict level-k network. By Claim 1, N is a
simple level-k network and hence contains a reticulation leaf x. First suppose x does not belong to
$R \cup \{s_{5k-4}\}$. Then removing x and tidying up the resulting graph yields a level-(k − 1) network
consistent with all triplets over $R \cup \{s_{5k-4}\}$. A contradiction, thus N contains no leaves outside
$R \cup \{s_{5k-4}\}$ as reticulation leaf. Symmetrically, no leaf outside $R \cup \{s_{5k-3}\}$ is a reticulation leaf of N.
It follows that only leaves from R can be reticulation leaves of N, so x belongs to R. □



We are now ready to prove the theorem. The proof is by induction on k; the base case k = 2 has been
proven by Van Iersel et al. [27]. Let k > 2 and assume the theorem holds for all k' = 2, ..., k − 1. In
the induction step, we will show that any level-k network consistent with $T^k$ and with reticulation leaf
$s_{5i}$ (for any i ∈ {1, ..., k − 1}), equals the network $N^k$. The case that $s_{5k-2}$ is a reticulation leaf is
symmetric to the case that $s_{5k-5}$ is a reticulation leaf. Since by Claim 2 at least one leaf from R must
be a reticulation leaf, the theorem will follow.

Let N be a simple level-k network consistent with $T^k$ and with reticulation leaf $s_{5i}$ (with
i ∈ {1, ..., k − 1}). Let T' be the triplet set obtained from $T^k$ by removing all triplets containing some
leaf from $\{s_{5i-4}, \ldots, s_{5i}\}$, i.e. $T' = T^k|(L(T^k) \setminus \{s_{5i-4}, \ldots, s_{5i}\})$. Then T' is consistent with network N',
the second network from the left in Fig. 4. Because T' equals the set of all triplets that are consistent
with N' (which is a relabeling of $N^{k-1}$), by the induction hypothesis N' is the unique level-(k − 1)
network consistent with T'.

Consider the network obtained from N by removing the leaves $s_{5i-1}$, $s_{5i-2}$, $s_{5i}$, $s_{5i-3}$ and $s_{5i-4}$ (in
this order) from N and tidying up the resulting graph. This decreases the level of the network, since the
parent of $s_{5i}$ was a reticulation vertex and gets removed when tidying up the graph. Hence this gives a
level-(k − 1) network consistent with T', which by the induction hypothesis equals N'.

To show that N equals $N^k$, consider the network N' and apply the reverse of the operation that
removed the leaves $s_{5i-1}$, $s_{5i-2}$, $s_{5i}$, $s_{5i-3}$ and $s_{5i-4}$ from N. This process is illustrated in Fig. 4, and we
will show that the network obtained in this way equals $N^k$. Process the leaves in reverse order, so add
$s_{5i-4}$ to N' first. Since N' has k − 1 reticulation leaves and $s_{5i}$ also has to become a reticulation leaf,
$s_{5i-4}$ must be a leaf below a split vertex. Hence $s_{5i-4}$ is added to the network by hanging $s_{5i-4}$ on some
arc of N'. The same holds for $s_{5i-3}$. Since $s_{5i}$ was a reticulation leaf in N, it is added to the network by
choosing two arcs $(u_1, v_1)$, $(u_2, v_2)$, subdividing them into $(u_1, w_1)$, $(w_1, v_1)$ and $(u_2, w_2)$, $(w_2, v_2)$,
respectively, and adding a new reticulation vertex x and arcs $(w_1, x)$, $(w_2, x)$, $(x, s_{5i})$. Subsequently,
$s_{5i-2}$ and $s_{5i-1}$ are added to the network by hanging them on arcs to be specified. It remains to
determine which arcs to subdivide, as to add the leaves $s_{5i-1}$, $s_{5i-2}$, $s_{5i}$, $s_{5i-3}$ and $s_{5i-4}$.

First consider the case i > 1. Because $s_{5k-4}s_{5i+1}|s_{5i-4}$ and $s_{5i-4}s_{5i+1}|s_{5i-7}$ are triplets in $T^k$, it
follows that $s_{5i-4}$ is added to N' by hanging it on the arc entering the parent of $s_{5i+1}$. Symmetrically,
$s_{5i-3}$ is hung on the arc entering the parent of $s_{5i+2}$. This leads to network N'' in Fig. 4. Next we
discuss how to add $s_{5i}$ to network N''. Triplets $s_{5i}s_{5i+1}|s_{5i-4}$ and $s_{5k-4}s_{5i+1}|s_{5i}$ force a subdivision of
the arc between the parents of $s_{5i-4}$ and $s_{5i+1}$. For symmetric reasons, also the arc between the parents
of $s_{5i+2}$ and $s_{5i-3}$ has to be subdivided. So subdivide these arcs and make $s_{5i}$ a reticulation leaf below
them (as described in detail in the previous paragraph). This leads to the network N''' in Fig. 4. Now
$s_{5i-2}$ and $s_{5i-1}$ can only be added to the network by hanging them on the arcs entering the parent of
$s_{5i}$, since $s_{5i+1}s_{5i-2}|s_{5i},\ s_{5i}s_{5i-2}|s_{5i+1} \in T^k$ and $s_{5i+2}s_{5i-1}|s_{5i},\ s_{5i}s_{5i-1}|s_{5i+2} \in T^k$. This leads to the
leftmost network in Fig. 4, which is the network $N^k$.

The case i = 1 is slightly different, since a leaf $s_{5i-7}$ does not exist. However, the triplets $s_{5k-4}s_6|s_1$
and $s_6 s_1|s_7$ enforce that $s_1 = s_{5i-4}$ is added to N' by hanging it on the arc entering the parent of $s_6$.
Symmetrically, $s_2 = s_{5i-3}$ must be hung on the arc entering the parent of $s_7$. The same arguments as
in the case i > 1 show how to add the leaves $s_{5i}$, $s_{5i-1}$, $s_{5i-2}$. Also in this case we obtain the network
$N^k$.

It follows that N equals $N^k$, completing the proof of Theorem 1. □

For level 1, the network $N^1$ is not the only network consistent with $T^1$. Fig. 5(a) shows the three
networks that are consistent with $T^1$. However, there does exist a level-1 network that is unique in this
sense. It is not too difficult to argue that the network $N^{1*}$ in Fig. 5(b) is the only level-1 network that
is consistent with all triplets that are consistent with $N^{1*}$. For level 0, the only tree consistent with a
single triplet is the triplet itself.

5. From Uniqueness to Intractability of Constructing Level-k Networks

In this section we show how to use the unique networks from the previous section in the complexity
analysis of network reconstruction methods from triplets. We demonstrate this in two NP-hardness
proofs. First, we show that it is NP-hard, for each k ≥ 1, to decide whether a given triplet set is
consistent with some level-k network. Secondly, we show that the maximisation variant of this problem
is NP-hard for each k ≥ 0, even for dense triplet sets.

Figure 5. (a) The three networks $N^1$, $N^{1'}$ and $N^{1''}$ that are consistent with $T^1 = \{s_1 s_3|s_2,\ s_2 s_3|s_1\}$ and
(b) network $N^{1*}$, which is the unique network that is consistent with the set of triplets consistent with
$N^{1*}$.

We start with the proof that it is NP-hard to construct a level-k network consistent with all input
triplets. Hardness was already known for k = 1 [15] and k = 2 [27]. Note that the uniqueness result from
the previous section plays a crucial role in the subsequent NP-hardness proof for level k, and that the
NP-hardness is not a consequence of the hardness for levels 1 and 2.

In the proofs, we will often say we "hang" a leaf or "caterpillar" from a "side" of a simple level-k
generator. A network is a caterpillar if deleting all leaves gives a directed path. In simple level-k
generators, a side is either an arc or a vertex with outdegree zero (cf. [27]). Hanging a caterpillar from
arc $S_i$ means subdividing $S_i$ and connecting the new vertex to the root of the caterpillar. Similarly
defined is hanging a caterpillar from a vertex with outdegree zero, which gets connected to the root of
the caterpillar. Hanging a leaf from a side is defined similarly. In addition, a leaf x is on side $S_i$ if there
exists a cut-arc (u, v) such that u is on a subdivision of $S_i$ (if $S_i$ is an arc) or u is a reticulation vertex
(if $S_i$ is a reticulation vertex), and there is a directed path from v to x (possibly v = x). A leaf x is said
to hang between vertices w and q if there is a cut-arc (u, v) such that u is on a directed path from w to
q and there is a directed path from v to x.

Theorem 2. For each k ≥ 2, it is NP-hard to decide whether for a triplet set T there exists some
level-k network N consistent with T.

Proof. Reduce from the following NP-hard problem [9]:
Set Splitting

Instance: A set $U = \{u_1, \ldots, u_n\}$ and a collection $C = \{C_1, \ldots, C_m\}$ of size-3 subsets of U.
Question: Can U be partitioned into sets $U_1$ and $U_2$ (a set splitting) such that $C_j \not\subseteq U_1$ and
$C_j \not\subseteq U_2$, for all 1 ≤ j ≤ m?
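
For readers unfamiliar with Set Splitting, the following brute-force Python sketch decides small instances by trying every partition of U. It is meant only to pin down the problem statement; the function name is hypothetical and the exponential search is not part of the reduction.

```python
from itertools import product


def find_set_splitting(universe, collection):
    """Return a partition (U1, U2) that splits every set in the collection, or None."""
    universe = list(universe)
    for mask in product((0, 1), repeat=len(universe)):
        side = dict(zip(universe, mask))
        # A set is split when its elements do not all land on the same side.
        if all(len({side[u] for u in c}) == 2 for c in collection):
            u1 = {u for u in universe if side[u] == 0}
            return u1, set(universe) - u1
    return None
```

A classical instance without a set splitting is the collection of the seven lines of the Fano plane, which the search above correctly rejects.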

From an instance (U, C) of Set Splitting construct a set T of triplets as follows. Start with triplet
set $T^k$ (see previous Section), and for each set $C_j = \{u_a, u_b, u_c\} \in C$ (with 1 ≤ a < b < c ≤ n) add
triplets $u_a^j s_5|u_b^j$, $u_b^j s_5|u_c^j$ and $u_c^j s_5|u_a^j$. In addition, for every $u_i \in U$ and 1 ≤ j ≤ m add triplets
$s_5 u_i^j|s_1$, $s_5 u_i^j|s_2$, $s_5 s_6|u_i^j$, $s_5 s_7|u_i^j$ and (if j ≠ m) $u_i^j u_i^{j+1}|s_5$. This completes the construction of T.
We will prove that T is consistent with some level-k network if and only if there exists a set splitting
$\{U_1, U_2\}$ of (U, C).

First suppose that there exists a set splitting $\{U_1, U_2\}$ of (U, C). Construct the network N by starting
with the network $N^k$, which is obtained from the simple level-k generator $G^k$ in Fig. 6 by hanging a leaf
$s_i$ on each side $S_i$. For each element $u_i \in U_1$, hang all leaves $u_i^1, \ldots, u_i^m$ on side $S_1$ below the parent of
$s_1$; for each element $u_i \in U_2$ hang all leaves $u_i^1, \ldots, u_i^m$ on side $S_2$ below the parent of $s_2$. To determine
the order in which to put these leaves consider a set $C_j = \{u_a, u_b, u_c\} \in C$. If $u_a$ and $u_b$ are in the same
class of the partition, then put leaf $u_a^j$ below $u_b^j$; if $u_b$ and $u_c$ are in the same class of the partition put
$u_b^j$ below $u_c^j$; and if $u_a$ and $u_c$ are in the same class put $u_c^j$ below $u_a^j$. The rest of the ordering is
arbitrary. It is easy to check that N is consistent with all triplets in T. For an example of this
construction see the network to the right in Fig. 6.

Figure 6. Auxiliary networks in the proof of Theorem 2: (a) the simple level-k generator $G^k$ and (b) the
network N.

Conversely, suppose that T is consistent with some level-k network N. Since $T^k \subseteq T$, Theorem 1
says that N must be equal to $N^k$ with the leaves not in $L(N^k)$ added. Triplets $s_5 u_i^j|s_1$ and $s_5 u_i^j|s_2$
imply that none of the leaves $u_i^j$ can hang between the root and $s_1$, or between the root and $s_2$. Further,
triplets $s_5 s_6|u_i^j$ and $s_5 s_7|u_i^j$ imply that $u_i^j$ must be on either side $S_1$ or $S_2$. Triplets $u_i^j u_i^{j+1}|s_5$ yield
that for each 1 ≤ i ≤ n, all leaves $u_i^1, \ldots, u_i^m$ have to hang on the same side. For h ∈ {1, 2}, let $U_h$ be
the set of elements $u_i \in U$ for which all leaves $u_i^1, \ldots, u_i^m$ hang on side $S_h$. It remains to prove that
$(U_1, U_2)$ is a set splitting of (U, C). Consider a set $C_j = \{u_a, u_b, u_c\}$ and suppose for contradiction that
$u_a, u_b, u_c \in U_h$ for some h ∈ {1, 2}. It follows that all leaves $u_a^j, u_b^j, u_c^j$ hang between $S_h$ and the root.
This is impossible, as T contains triplets $u_a^j s_5|u_b^j$, $u_b^j s_5|u_c^j$, $u_c^j s_5|u_a^j$. □

For dense triplet sets, it can be decided in polynomial time whether there exists a level-1 [15] or level-2 [28,27] network consistent with all input triplets. Using the uniqueness result from the previous section, we will prove that the maximization versions of these problems are NP-hard, even for dense triplet sets and for all $k \ge 0$.

MaxCL-$k$-Dense

Instance: A dense triplet set $T$.

Output: A level-$k$ network consistent with the maximum number of triplets in $T$ that any level-$k$ network is consistent with.



Theorem 3. The problem MaxCL-$k$-Dense is NP-hard, for all $k \ge 0$.

Proof. Reduce from the following NP-hard problem [2,7].

Feedback Arc Set in Tournaments (FAST)

Instance: A complete directed graph $G = (V, A)$ and an integer $q \in \mathbb{N}$.
Question: Is there a set $A' \subseteq A$ of $q$ arcs such that $G' = (V, A \setminus A')$ is acyclic?

For $k = 0$, we imitate the reduction of the non-dense case [4,29]. The difference is that the constructed instance of MaxCL-0-Dense contains more triplets, so that it becomes dense. Given an instance $G = (V, A)$ and $q \in \mathbb{N}$ of FAST, construct an instance $T$ of MaxCL-0-Dense as follows. Introduce a vertex $x \notin V$ and for each arc $(z, y) \in A$, add a triplet $xy|z$ to $T$. In addition, for each combination of three leaves $v_1, v_2, v_3 \in V$ (thus $v_1, v_2, v_3 \ne x$), add all three triplets $v_1v_2|v_3$, $v_1v_3|v_2$ and $v_2v_3|v_1$ to $T$. The differences with the reduction of Wu [29] are that (1) we reduce from FAST instead of FAS and that (2) we add all triplets containing three leaves from $V$. The combination of these two modifications makes the instances dense. The extra triplets do not change the reduction since any level-0 network is consistent with exactly one triplet for every combination of three leaves. The intuition of the reduction is as follows: the vertices of an acyclic graph can be uniquely labeled such that arcs point only from vertices with higher label to vertices with lower label. In a phylogenetic tree, this ordering of the vertices corresponds to an ordering of the leaves on the unique path from the tree root to leaf $x$. Along the lines of the proof for the non-dense case [4,29], it can be argued that $G$ contains a feedback arc set of size $q$ if and only if there exists a tree consistent with $|T| - q - 2\binom{|V|}{3}$ triplets from $T$. This completes the proof that MaxCL-0 is NP-hard for dense triplet sets.
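The construction for $k = 0$ can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: a triplet $ab|c$ is represented as the pair `((a, b), c)` with `a, b` sorted, and the names `fast_to_dense_triplets`, `is_dense` and `target` are hypothetical helpers introduced here.

```python
from itertools import combinations
from math import comb


def fast_to_dense_triplets(vertices, arcs, x="x"):
    """Build the dense triplet set T from a tournament (vertices, arcs)."""
    assert x not in vertices
    triplets = set()
    for z, y in arcs:  # arc (z, y) becomes triplet xy|z
        triplets.add((tuple(sorted((x, y))), z))
    for a, b, c in combinations(sorted(vertices), 3):
        # all three triplets over every 3-set of vertices from V
        triplets.add(((a, b), c))
        triplets.add(((a, c), b))
        triplets.add(((b, c), a))
    return triplets


def is_dense(triplets, leaves):
    """True if every 3-subset of the leaves carries at least one triplet."""
    covered = {frozenset(pair) | {c} for pair, c in triplets}
    return all(frozenset(s) in covered for s in combinations(leaves, 3))


def target(triplets, vertices, q):
    """Number of triplets a tree must satisfy: |T| - q - 2 * C(|V|, 3)."""
    return len(triplets) - q - 2 * comb(len(vertices), 3)
```

On a 3-cycle tournament with vertices `a, b, c` and arcs `(a, b), (b, c), (c, a)`, the sketch produces three arc triplets plus three triplets over `{a, b, c}`, and with `q = 1` the target value is `6 - 1 - 2 = 3`.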

For $k \ge 2$, use a similar reduction but start from the simple level-$k$ generator $G^k$ in Fig. 6(a). Use the following property of $G^k$, implied by Theorem 1: Let $N^k$ be a network obtained by hanging a leaf from each side of $G^k$. If $T^k$ is the triplet set consistent with $N^k$, then $N^k$ is the unique level-$k$ network consistent with $T^k$.

Given a tournament $G = (V, A)$ and integer $q \in \mathbb{N}$, construct a corresponding instance $T$ of MaxCL-$k$-Dense as follows. First construct a network $N'$ from $G^k$. From each side $S_i$ of $G^k$ hang a caterpillar with leaves $s_i^1, \ldots, s_i^p$, with $p = 2(q + 2\binom{|V|}{3}) + 1$. The intuition is that $p$ is "large" to force a specific
structure of the networks consistent with many triplets in T. For simplicity denote S^ k _ 2 by x. Hang 
$|V|$ leaves on side $S_{5k-4}$, distinctly labeled by the vertices of $V$, between the root of the caterpillar and the reticulation vertex on that side, in arbitrary order. This gives the network $N'$. For an example, see the network on the right in Fig. 7. Let $T'$ be the set of triplets consistent with $N'$, except for triplets $ab|c$ with $a, c \in V$ and $b \notin V$. For each arc $(z, y) \in A$, add a triplet $xy|z$ to $T'$, informally encoding the arc $(z, y)$ as a constraint "$z$ hangs between the root of the caterpillar and $y$". Finally, for each 3-set of vertices from $V$, add all three triplets over the three leaves labeled by these vertices that are not yet present. Denote the resulting (dense) triplet set by $T$, which forms an instance of MaxCL-$k$-Dense. We will show that there exists a level-$k$ network $N$ consistent with $|T| - q - 2\binom{|V|}{3}$ triplets from $T$ if and only if there exists a feedback arc set $A'$ of size $q$.

First suppose $G$ has a feedback arc set $A'$ of size $q$. Thus the graph $G' = (V, A \setminus A')$ is acyclic, and each vertex $v \in V$ can receive a label $f(v)$ such that there are no arcs $(z, y) \in A \setminus A'$ with $f(y) > f(z)$. Construct the network $N$ from $N'$ by rearranging the leaves from $V$, sorting them with respect to their labels such that the highest leaf has the largest label. For any arc $(z, y) \in A \setminus A'$ it holds that $f(y) < f(z)$ and hence the triplet $xy|z$ is consistent with $N$. For every vertex pair $\{z, y\}$, the triplet $yz|x$ is consistent with $N$. For each combination of three leaves from $V$ there is exactly one triplet over these leaves consistent with $N$. It follows that the only triplets in $T$ that are not consistent with $N$ are (1) the triplets corresponding to the arcs in $A'$, and (2) exactly two-thirds of the triplets that have only leaves in $V$. That means that in total $|T| - q - 2\binom{|V|}{3}$ triplets from $T$ are consistent with $N$.

For the converse, suppose there exists some level-$k$ network $N$ consistent with $|T| - q - 2\binom{|V|}{3}$ triplets from $T$. For all $1 \le j \le p$, there exists a unique network with leaf set $L_j = \{s_i^j \mid 1 \le i \le 5k - 2\}$ that is consistent with all triplets from $T_j = T|_{L_j}$. There are at most $q + 2\binom{|V|}{3}$ triplets not consistent with $N$, and the sets $T_j$ are pairwise disjoint, so at least one of the sets $L_j$ is placed on a simple level-$k$ network of type $G^k$. Take any $i$ and observe that for each $j$ such that $s_i^j$ is not on side $S_i$ of $N$, there exists a triplet $t \in T$ that is not consistent with $N$ and $L(t) = \{s_i^j, \ell_1, \ell_2\}$ for $\ell_1, \ell_2 \in \{s_i^1, \ldots, s_i^p\}$. If there were more than $q + 2\binom{|V|}{3}$ such $j$, then there would be more than $q + 2\binom{|V|}{3}$ distinct triplets from $T$ not consistent with $N$. Hence for each $i$ there are at least $p' = q + 2\binom{|V|}{3} + 1$ indices $j$ such that $s_i^j$ is on side $S_i$. Let $L_1^*, \ldots, L_{p'}^*$ be pairwise disjoint sets each containing exactly one leaf $s_i^j$ that is on side $S_i$, for each $1 \le i \le 5k - 2$.

Figure 7. An example input $G = (V, A)$ of FAST on the left and the network $N$ constructed in the proof of Theorem 3, for $k = 2$, to the right.

The next claim is that all leaves labeled by vertices from $V$ have to be on side $S_{5k-4}$, between the root of the caterpillar and the reticulation vertex on that side. Suppose for contradiction this were not the case for some leaf labeled by $v \in V$. Then for each of the leaf sets $L_1^* \cup \{v\}, \ldots, L_{p'}^* \cup \{v\}$ there exists a triplet in $T$ not consistent with $N$. Since the sets $L_1^*, \ldots, L_{p'}^*$ are pairwise disjoint and $p' > q + 2\binom{|V|}{3}$, we obtain a contradiction.

Since the leaves corresponding to vertices from $V$ all hang on the same side $S_{5k-4}$, they can be uniquely labeled by their order on side $S_{5k-4}$, such that the highest leaf has the largest label. If some leaves are below the same cut-arc, they receive the same label. Let $A'$ be the set of arcs $(z, y)$ corresponding to the triplets $xy|z$ that are not consistent with $N$, and for every $v \in V$ let $f(v)$ be the label of the leaf corresponding to $v$. Then the graph $G' = (V, A \setminus A')$ is acyclic, because all arcs $(z, y) \in A \setminus A'$ satisfy the relation $f(y) < f(z)$.

An example for $k = 2$ is displayed in Fig. 7. The graph on the left is an example instance $G = (V, A)$ of FAST. The arcs of $G$ are encoded as triplets $xw|u$, $xw|q$, $xu|v$, $xv|w$, $xu|q$ and $xq|v$. The network $N$ to the right is consistent with all these triplets except $xv|w$. The arc $(w, v)$ on its own is indeed a feedback arc set of the graph $G$. Other triplets in $T$ enforce this specific level-2 network $N$ and make $T$ dense.
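The example can be checked mechanically. The sketch below decodes the six arc triplets listed above back into arcs (a triplet $xy|z$ encodes the arc $(z, y)$) and tests whether removing $(w, v)$ leaves an acyclic graph, producing labels $f$ with $f(y) < f(z)$ on every remaining arc. The helper names `arcs_from_triplets` and `topological_labels` are assumptions made for this illustration.

```python
def arcs_from_triplets(triplets):
    """Decode strings of the form 'xy|z' into arcs (z, y)."""
    arcs = []
    for t in triplets:
        pair, z = t.split("|")
        assert len(pair) == 2 and pair[0] == "x"
        arcs.append((z, pair[1]))
    return arcs


def topological_labels(vertices, arcs):
    """Return labels f with f(y) < f(z) for every arc (z, y), or None if cyclic."""
    out = {v: set() for v in vertices}
    for z, y in arcs:
        out[z].add(y)
    labels, next_label = {}, 1
    remaining = set(vertices)
    while remaining:
        # a sink has no outgoing arc into the remaining vertices
        sinks = sorted(v for v in remaining if not (out[v] & remaining))
        if not sinks:
            return None  # every remaining vertex has an out-arc: a cycle exists
        labels[sinks[0]] = next_label
        next_label += 1
        remaining.remove(sinks[0])
    return labels


arcs = arcs_from_triplets(["xw|u", "xw|q", "xu|v", "xv|w", "xu|q", "xq|v"])
without_wv = [a for a in arcs if a != ("w", "v")]
```

Here `topological_labels("uvwq", arcs)` returns `None` because of the cycle through $u$, $w$ and $v$, while `topological_labels("uvwq", without_wv)` returns a valid labeling with $v$ receiving the largest label.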

For $k = 1$ the same reduction as for $k \ge 2$ works, when hanging two caterpillars from side $S_1$. □

6. An Exact Algorithm for Constructing Level-1 Networks

Given the intractability results from the previous section for constructing networks consistent with a maximum number of input triplets, there is no hope (unless P = NP) for algorithms solving these problems exactly and in polynomial time. Still, these problems need to be solved in practice, so algorithms for MaxCL-$k$-Dense and its relaxation MaxCL-$k$ to general triplet sets either are not guaranteed to give an optimal solution, or require superpolynomial time. In this section we consider the latter approach.

Wu described an exact algorithm [29] that finds a tree consistent with a maximum number of input triplets in $O(3^n(n^2 + m))$ time, with $m$ the number of triplets and $n$ the number of leaves. We extend this approach to reconstructing evolutionary histories that are not tree-like, but where reticulation cycles are disjoint. We do this by describing an exact algorithm that runs in $O(m4^n)$ time and solves the MaxCL-1 problem, which is NP-hard by Theorem 3.

Note that the problem MaxCL-1 does not only ask if there exists a level-1 network consistent with
all input triplets; it asks us to find a level-1 network that is consistent with a maximum number of
them. Hence an algorithm for MaxCL-1 always outputs a solution, no matter how bad the data is the 
algorithm is confronted with. This contrasts with existing algorithms i* 28 - 16 that only find a solution if 
a network exists that is consistent with all triplets of a (dense) input. The algorithm described in this 
section is also more powerful in that it also works for non-dense triplet sets. It can thus be used even if 
for some combinations of three taxa it is difficult to find the right triplet, which is very likely to be the 
case in practice. The very same algorithm works for the weighted version of the problem. In addition, 
it can also be used to choose, among all level-1 networks consistent with a maximum number of input
triplets, a network with a minimum number of reticulation vertices. However, its exponential running 
time means that it can only be used for a relatively small number of leaves at a time. 

The intuition behind our algorithm is the following. There are three different shapes possible for the 
optimal network. Either the arcs leaving the root are cut-arcs, like in Fig. 10(b), or the root is part of a 
cycle, which can be "skew" like the cycle in Fig. 11(a) or "non-skew" like in Fig. 11(b). We can try to 
construct a network of each type separately. Given the tripartition (X' ,Y' , Z') or bipartition (X',Y') 
of the leaves indicated in the figures, it turns out to be possible to reconstruct the optimal network 
by combining optimal smaller networks for X', Y' , X' U Z' and Y' U Z' . Critical is that these smaller 
networks for X' U Z' and Y' U Z' must be such that combining the different networks does not create 
biconnected components with more than one reticulation vertex. 

To achieve this, we introduce the notion of "non-cycle-reachable" arc, or n.c.r.-arc for short. An arc $a = (u, v)$ is an n.c.r.-arc if there is no directed path of length at least one from any vertex $w$ in an (undirected) cycle to $u$. Also, for an arc $a = (u, v)$ write $R[a]$ to denote the set of leaves below $v$. Use $f_T(N)$ to denote the number of triplets in $T$ consistent with $N$, and $g_T(N, Z)$ to denote the number of triplets in $T$ that are consistent with $N$ and not of the form $xy|z$ with $z \in Z$ and $x, y \notin Z$. Write $f(N)$ as short for $f_T(N)$ and $g(N, Z)$ for $g_T(N, Z)$. It will become clear later that the definition of $g$ ensures that combining networks that are optimal w.r.t. $g$ leads to networks optimal w.r.t. $f$.
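The difference between $f$ and $g$ is only which triplets are counted. The following Python sketch shows this under an explicit assumption: consistency with the network $N$ is supplied from outside as a predicate `is_consistent`, since testing consistency with a network is not described in this section. The representation `((x, y), z)` for a triplet $xy|z$ and all function names are hypothetical.

```python
def excluded_by_g(triplet, Z):
    """True for triplets xy|z with z in Z and both x and y outside Z."""
    (x, y), z = triplet
    return z in Z and x not in Z and y not in Z


def f_count(triplets, is_consistent):
    """f(N): number of triplets consistent with N."""
    return sum(1 for t in triplets if is_consistent(t))


def g_count(triplets, is_consistent, Z):
    """g(N, Z): as f(N), but ignoring triplets xy|z with z in Z, x, y not in Z."""
    return sum(
        1 for t in triplets if is_consistent(t) and not excluded_by_g(t, Z)
    )
```

In a toy instance where the consistent triplets are $ab|c$ and $cd|a$ and $Z = \{c\}$, `f_count` counts both, whereas `g_count` skips $ab|c$ because its singled-out leaf lies in $Z$ and the other two leaves do not.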

The algorithm works as follows. Loop through all subsets $L' \subseteq L$ in increasing cardinality and consider each tripartition $\pi(L') = (X, Y, Z)$ with $X, Y \ne \emptyset$. While the roles of $X$ and $Y$ are symmetric, this is not the case for $X$ and $Z$, or for $Y$ and $Z$. The following networks have been computed in previous iterations of the algorithm:

• A network $N^X$ maximizing $f(N)$ over all level-1 networks $N$ with $L(N) = X$;

• A network $N^Y$ maximizing $f(N)$ over all level-1 networks $N$ with $L(N) = Y$;

• A network $N^{XZ}$ maximizing $g(N, Z)$ over all level-1 networks $N$ with $L(N) = X \cup Z$ that contain an n.c.r.-arc $a$ with $Z = R[a]$;

• A network $N^{YZ}$ maximizing $g(N, Z)$ over all level-1 networks $N$ with $L(N) = Y \cup Z$ that contain an n.c.r.-arc $a$ with $Z = R[a]$.

If $Z = \emptyset$, combine $N^X$ and $N^Y$ into a new network $N^2_\pi$ by adding a new root and connecting it to the roots of $N^X$ and $N^Y$. If $Z \ne \emptyset$, proceed as follows:

(1) Combine $N^{XZ}$ and $N^{YZ}$ into a new network $N^1_\pi$ by creating a "non-skew" cycle as follows. Add a new root and connect it to the roots of $N^{XZ}$ and $N^{YZ}$. Let $a = (u, v)$ and $a' = (u', v')$ be the (unique) n.c.r.-arcs such that $Z = R[a]$ in $N^{XZ}$ and $Z = R[a']$ in $N^{YZ}$. Subdivide $a$ into $(u, w)$ and $(w, v)$, delete $v'$ and all arcs and vertices reachable from $v'$, and add an arc $(u', w)$ (see Fig. 8);

(2) Combine $N^X$ and $N^{YZ}$ into a new network $N^2_\pi$ by adding a new root and connecting it to the roots of $N^X$ and $N^{YZ}$;

(3) Create $N^3_\pi$ from $N^2_\pi$ by creating a "skew" cycle as follows: let $a = (u, v)$ be the (unique) n.c.r.-arc with $Z = R[a]$. Subdivide $a$ into $(u, w)$ and $(w, v)$, add a new root and connect it to the old root and to $w$ (see Fig. 9).

Let $N(L')$ be a network that maximizes $f(N)$ over the networks $N^1_\pi$, $N^2_\pi$ and $N^3_\pi$ over all tripartitions $\pi(L')$. In addition, for each $Z \subset L'$, let $N^2(L', Z)$ be a network that maximizes $g(N^2_\pi, Z)$ over the networks $N^2_\pi$ over all tripartitions $\pi(L')$ whose third part equals $Z$. This concludes the description of the algorithm.
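The outer structure of the algorithm (subsets in increasing cardinality, then tripartitions with $X, Y \ne \emptyset$) can be sketched as follows. This is only the enumeration skeleton under illustrative names (`subsets_by_size`, `tripartitions`, `count_iterations`); it does not build any networks. Ordered tripartitions are generated, so each unordered pair $\{X, Y\}$ appears twice, which is harmless since their roles are symmetric.

```python
from itertools import combinations, product


def subsets_by_size(leaves):
    """Yield all nonempty subsets L' of the leaf set, smallest first."""
    leaves = sorted(leaves)
    for size in range(1, len(leaves) + 1):
        for subset in combinations(leaves, size):
            yield frozenset(subset)


def tripartitions(leaves):
    """Yield ordered tripartitions (X, Y, Z) with X and Y nonempty; Z may be empty."""
    leaves = sorted(leaves)
    for assignment in product("XYZ", repeat=len(leaves)):
        parts = {"X": set(), "Y": set(), "Z": set()}
        for leaf, part in zip(leaves, assignment):
            parts[part].add(leaf)
        if parts["X"] and parts["Y"]:
            yield frozenset(parts["X"]), frozenset(parts["Y"]), frozenset(parts["Z"])


def count_iterations(leaves):
    """Total number of (L', tripartition) pairs visited by the loops."""
    return sum(1 for sub in subsets_by_size(leaves) for _ in tripartitions(sub))
```

A subset of size $i$ has $3^i - 2 \cdot 2^i + 1$ such tripartitions, so over all subsets of $n$ leaves the loops visit $4^n - 2 \cdot 3^n + 2^n$ pairs, which stays below $4^n$.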

Because the arcs $a = (u, v)$ and $a' = (u', v')$ in steps (1) and (3) are n.c.r.-arcs, we know that (in $N^{XZ}$ and $N^{YZ}$) neither $u$, nor $u'$, nor one of their ancestors is contained in a cycle. It follows that the newly created cycles do not overlap with any of the original cycles and hence that the constructed networks are indeed level-1 networks. It now also becomes clear why networks $N^{XZ}$ and $N^{YZ}$ are used that are optimal w.r.t. $g$ (rather than $f$). The creation of a new cycle, as in Fig. 8 and Fig. 9, causes all triplets of the form $xy|z$ with $z \in Z$ and $x, y \notin Z$ to become consistent with the network.

We claim that $N(L')$ maximizes $f(N)$ over all level-1 networks $N$ with $L(N) = L'$. This implies that, in each iteration of the algorithm, the networks $N^X$ and $N^Y$ have indeed been computed in a previous iteration. This claim also implies that the algorithm finds an optimal solution.

In addition, we claim that $N^2(L', Z)$ maximizes $g(N, Z)$ over all level-1 networks $N$ with $L(N) = L'$ that contain an n.c.r.-arc $a$ with $Z = R[a]$. This implies that, in each iteration, the networks $N^{XZ}$ and $N^{YZ}$ have indeed been computed in a previous iteration of the algorithm.

The above claims are proved by induction on the size of $L'$. They hold for sets $L'$ with $|L'| < 3$; so given some leaf set $L'$ with $|L'| \ge 3$, assume that the above statements hold for all leaf sets of smaller size. We will show that the statements are then also true for $L'$. Observe that from the induction hypothesis it follows that we may take $N(X)$ to be $N^X$, which has hence indeed been computed in a previous iteration of the algorithm. Similarly, we may take $N^2(X \cup Z, Z)$ and $N^2(Y \cup Z, Z)$ to be $N^{XZ}$ and $N^{YZ}$, respectively, which have also indeed been computed in a previous iteration of the algorithm. The induction step then follows from the following two lemmas.

Lemma 4. For every $Z \subset L'$, the network $N^2(L', Z)$ maximizes $g(N, Z)$ over all level-1 networks $N$ with $L(N) = L'$ that contain an n.c.r.-arc $a$ with $Z = R[a]$.

Proof. Let $N'$ be a network with $L(N') = L'$ and some n.c.r.-arc $a$ such that $Z = R[a]$. We show that $g(N', Z) \le g(N^2(L', Z), Z)$. Because $N'$ contains the n.c.r.-arc $a$, the root of $N'$ is not in a cycle. Let $a_1$ and $a_2$ be the two cut-arcs leaving the root such that the leaves in $Z$ are reachable from $a_2$. Let $X' = R[a_1]$ and $Y' = R[a_2] \setminus Z$, see Fig. 10(a). Because $N^2(L', Z)$ maximizes $g(N^2_\pi, Z)$ over all tripartitions of $L'$ whose third part is $Z$, it is certainly at least as good as $N^2_{(X',Y',Z)}$. That is, $g(N^2(L', Z), Z) \ge g(N^2_{(X',Y',Z)}, Z)$. Write $N^{2'}$ as short for $N^2_{(X',Y',Z)}$. Compare the triplets consistent with $N'$ with those consistent with $N^{2'}$.

• There are at least as many triplets in $T|_{X'}$ consistent with $N^{2'}$ as with $N'$, because $N^{X'}$ is a subgraph of $N^{2'}$ and $N^{X'}$ maximizes $f(N)$ over all networks $N$ with $L(N) = X'$.

• There are at least as many triplets in $T|_{(Y' \cup Z)}$ that are not of the form $y_1y_2|z$ for $y_1, y_2 \in Y'$ and $z \in Z$ that are consistent with $N^{2'}$ as with $N'$, because $N^{Y'Z}$ is a subgraph of $N^{2'}$ and $N^{Y'Z}$ maximizes $g(N, Z)$ over all networks $N$ with $L(N) = Y' \cup Z$ that contain an n.c.r.-arc $a$ with $Z = R[a]$.

• All triplets of the form $ab|c$ with $a, b \in X'$, $c \in Y' \cup Z$ or $a, b \in Y' \cup Z$, $c \in X'$ are consistent with both $N^{2'}$ and $N'$.

• All triplets of the form $ab|c$ with $a, c \in X'$, $b \in Y' \cup Z$ or $a, c \in Y' \cup Z$, $b \in X'$ are consistent with neither $N^{2'}$ nor $N'$.

Thus $g(N', Z) \le g(N^{2'}, Z) \le g(N^2(L', Z), Z)$. □

Lemma 5. The network $N(L')$ maximizes $f(N)$ over all level-1 networks $N$ with $L(N) = L'$.

Proof. For contradiction, suppose that some network $N' \ne N(L')$ with $L(N') = L'$ is consistent with more triplets in $T$ than $N(L')$. Distinguish three cases, depending on the shape of $N'$.

The first case is that the two arcs leaving the root of $N'$ are cut-arcs $a_1$ and $a_2$. Let $X' = R[a_1]$, $Y' = R[a_2]$ and $Z' = \emptyset$, see Fig. 10(b), and compare $N'$ to $N^2_{(X',Y',Z')}$. The latter network is consistent with at least as many triplets from $T|_{X'}$ because it contains $N^{X'}$ as a subnetwork, and $N^{X'}$ maximizes $f(N)$ over all networks $N$ with $L(N) = X'$. Similarly, the network $N^2_{(X',Y',Z')}$ is consistent with at least as many triplets from $T|_{Y'}$ as $N'$. All other triplets are either consistent with both or with none of these networks. Hence $N^2_{(X',Y',Z')}$ is consistent with at least as many triplets as $N'$. Because $N(L')$ is consistent with at least as many triplets as $N^2_{(X',Y',Z')}$, it follows that $N(L')$ is also consistent with at least as many triplets as $N'$; a contradiction.

The second case is that one child of the root of $N'$ is a reticulation vertex. Let $a_1 = (r, v_1)$ and $a_2 = (r, v_2)$ be the two arcs leaving the root of $N'$ and suppose that $v_2$ is a reticulation vertex. Let $a_3$ and $a_4$ be the two arcs leaving $v_1$. Because $N'$ is a level-1 network, one of $a_3$, $a_4$ is a cut-arc, say $a_3$. Let $X' = R[a_3]$, $Y' = R[a_4] \setminus R[a_2]$ and $Z' = R[a_2]$, see Fig. 11(a). Compare the networks $N'$ and $N^3_{(X',Y',Z')}$ with respect to the number of triplets in $T$ these networks are consistent with. First, consider triplets in $T|_{X'}$. Network $N^3_{(X',Y',Z')}$ is consistent with at least as many of these as $N'$, because it contains $N^{X'}$ as a subgraph. Second, consider triplets in $T|_{(Y' \cup Z')}$ that are not of the form $y_1y_2|z$ for $y_1, y_2 \in Y'$ and $z \in Z'$. We will show that $N^3_{(X',Y',Z')}$ is consistent with at least as many of these triplets as $N'$. First recall that $N^3_{(X',Y',Z')}$ contains a subdivision of $N^{Y'Z'}$, which maximizes $g(N, Z')$ over all networks with $L(N) = Y' \cup Z'$ containing an n.c.r.-arc $a$ with $Z' = R[a]$. The network $N'$ does not contain such an n.c.r.-arc, but we will modify it to a network that does contain such an n.c.r.-arc and is consistent with the same number of the considered triplets. Let $N''$ be the network $N'$ with the arc $a_2$ removed. Observe that $g(N'', Z') = g(N', Z')$, and so it follows that $N^3_{(X',Y',Z')}$ is consistent with at least as many of the considered triplets as $N'$. All other triplets are either consistent with both $N^3_{(X',Y',Z')}$ and $N'$ or with none, since both networks have the structure from Fig. 11(a): only the internal structure inside $X'$, $Y'$ and $Z'$ might be different in the two networks. Hence $N^3_{(X',Y',Z')}$ is consistent with at least as many triplets as $N'$. Because $N(L')$ is consistent with at least as many triplets as $N^3_{(X',Y',Z')}$, it follows that $N(L')$ is also consistent with at least as many triplets as $N'$; a contradiction.

Figure 11. Shapes of networks, referred to as network $N'$ in the proof of Lemma 5.

The last case is that the two arcs $a_1$ and $a_2$ leaving the root of $N'$ are not cut-arcs and also do not lead to reticulation vertices. Let $X' = R[a_1] \setminus R[a_2]$, $Y' = R[a_2] \setminus R[a_1]$ and $Z' = R[a_1] \cap R[a_2]$, see Fig. 11(b). Compare the networks $N'$ and $N^1_{(X',Y',Z')}$ with respect to the number of triplets in $T$ these networks are consistent with. First, consider triplets in $T|_{(X' \cup Z')}$ that are not of the form $x_1x_2|z$ for $x_1, x_2 \in X'$ and $z \in Z'$. We will show that $N^1_{(X',Y',Z')}$ is consistent with at least as many of these triplets as $N'$. Recall that $N^1_{(X',Y',Z')}$ contains a subdivision of $N^{X'Z'}$, which maximizes $g(N, Z')$ over all networks with $L(N) = X' \cup Z'$ containing an n.c.r.-arc $a$ with $Z' = R[a]$. The network $N'$ does not contain such an n.c.r.-arc, but we will modify it to a network that does contain such an n.c.r.-arc and is consistent with the same number of the considered triplets. Let $a = (u, v)$ be the cut-arc in $N'$ with $Z' = R[a]$, and let $a'$ be the arc that leads to $u$ and is reachable from $a_2$. Let $N''$ be the network $N'$ with the arc $a'$ removed. Now $N''$ is consistent with the same number of the considered triplets as $N'$, and so it follows that $N^1_{(X',Y',Z')}$ is consistent with at least as many of the considered triplets as $N'$. In a similar way it follows that $N^1_{(X',Y',Z')}$ is consistent with at least as many triplets in $T|_{(Y' \cup Z')}$ that are not of the form $y_1y_2|z$ for $y_1, y_2 \in Y'$ and $z \in Z'$ as $N'$. All other triplets are either consistent with both networks or with none. Hence $N^1_{(X',Y',Z')}$ is consistent with at least as many triplets as $N'$. Because $N(L')$ is consistent with at least as many triplets as $N^1_{(X',Y',Z')}$, it follows that $N(L')$ is also consistent with at least as many triplets as $N'$; a contradiction. □



Theorem 4. Given a set $T$ of $m$ triplets over $n$ leaves, a level-1 network consistent with a maximum number of triplets in $T$ can be constructed in $O(m4^n)$ time and $O(n3^n)$ space.

Proof. To achieve a small polynomial factor in the complexity, we use dynamic programming to compute the optimal value of the solution as well as the partitions we have to choose in each step. Then a traceback algorithm constructs a network consistent with the maximum number of triplets. To be precise, the dynamic programming algorithm finds, for all $L' \subseteq L$, the maximum number $f(L')$ of triplets in $T$ consistent with a level-1 network with leaves $L'$. It also computes, for all $Z \subset L'$, the maximum value $g(L', Z)$ of $g(N, Z)$ over all level-1 networks $N$ with leaves $L'$ that contain an n.c.r.-arc $a$ with $Z = R[a]$. The algorithm loops through all the subsets $L' \subseteq L$ from small to large and considers all tripartitions $\pi(L') = (X, Y, Z)$. For each such partition, the values $f(X)$, $f(Y)$, $f(Z)$, $g(X \cup Z, Z)$ and $g(Y \cup Z, Z)$ are readily available from previous iterations. To compute the values $f(L')$ and $g(L', Z)$ it only remains to count certain triplets in $T$, whose consistency with a network only depends on the tripartition $(X, Y, Z)$ and the network type ($N^1_\pi$, $N^2_\pi$ or $N^3_\pi$). This can be done by first checking membership of $X$, $Y$ and $Z$ for each leaf in $L'$ (in $O(n)$ time) and then looping through all triplets only once. Hence this counting can be done in $O(n + m) = O(m)$ time. The algorithm's overall running time is thus bounded by $O(m) \sum_{i=1}^{n} \binom{n}{i} O(3^i) = O(m4^n)$.
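The final equality in the running-time bound is an instance of the binomial theorem. As a short sketch of the missing step, using only the sum stated above:

\[
\sum_{i=1}^{n} \binom{n}{i} 3^i \;\le\; \sum_{i=0}^{n} \binom{n}{i} 3^i \cdot 1^{n-i} \;=\; (3 + 1)^n \;=\; 4^n ,
\]

so multiplying by the $O(m)$ counting cost per tripartition gives the claimed $O(m4^n)$.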

For each leaf set $L' \subseteq L$, store the optimal tripartition and the optimal type of network ($N^1_\pi$, $N^2_\pi$ or $N^3_\pi$). In addition, store an optimal bipartition for all $L' \subseteq L$ and $Z \subset L'$. This yields a total space complexity of $O(n3^n)$.

Once the values $f(L')$ and $g(L', Z)$ have been computed and all optimal tripartitions and bipartitions have been stored, a level-1 network $N$ consistent with $f(L)$ many triplets can be constructed by traceback, in polynomial time. Optimality of the algorithm follows from Lemmas 4 and 5. □

7. Open Problems 

The obvious question to ask is whether the $O(m4^n)$ running time of our exact algorithm in Sect. 6 can be improved. The same question can be asked about the $O(3^n(n^2 + m))$ algorithm [29] for maximum consistent trees. It would also be interesting to extend the exact approach to the construction of level-2 networks, provided that reasonable running times can be achieved.

Positive results for level-3 and higher networks have so far remained out of reach. In light of the 65 simple level-3 generators [18], we fear that algorithms for constructing level-3 networks (and higher) will almost certainly not be possible by using approaches similar to the ones in Sect. 6. A similar statement holds for the dense level-2 case [28], since the devised algorithms explicitly distinguish between the structures of different level-$k$ generators. Tantalisingly, however, it remains a possibility that for each $k \ge 0$ it is polynomial-time solvable to determine whether there is a level-$k$ network consistent with a dense set of input triplets.

Approximability of the MaxCL-$k$ problem needs to be further explored. APX-completeness of MaxCL-0 is known [6], hence no Polynomial Time Approximation Scheme for MaxCL-0 is possible unless P = NP. It would be interesting to extend this result to $k > 0$. On the other hand, the best known approximation ratios are $\frac{1}{3}$ for MaxCL-0 [10] and 0.48 for MaxCL-1 [6], leaving (potentially) much room for improvement.

From a more practical point of view, it is worthwhile to study the actual level of real evolutionary 
histories. This will tell for which values of $k$ it remains important to design algorithms that construct
level-$k$ networks.
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